Merge Path – Cache-Efficient Parallel Merge and Sort

نویسندگان

Saher Odeh

Oded Green

Yitzhak Birk

چکیده

Merging two sorted arrays is a prominent building block for sorting and other functions. Its efficient parallelization requires balancing the load among compute cores, minimizing the extra work brought about by parallelization, and minimizing inter-thread synchronization requirements. Due to the extremely low compute to memoryaccess ratio, it is also critically important to efficiently utilize the memory system: minimize memory traffic, maximize the cache hit rate and minimize cache-coherence related activity. We present a novel approach to partitioning the two sorted arrays into pairs of contiguous sequences of elements, one from each array, such that 1) each pair comprises any desired total number of elements, and 2) the elements of each pair form a contiguous sequence in the final merged sorted array. While the resulting partition and the computational complexity are similar to those of certain previous algorithms, our approach is different, extremely intuitive, and offers interesting insights. Based on this, we present a synchronization-free, cacheefficient merging (and sorting) algorithm. While we use CREW PRAM as the basis, our algorithm is easily adaptable to additional architectures. In fact, our approach is even relevant to sequential cache-efficient sorting. The new algorithm has been implemented both on the HyperCore many-core sharedcache architecture and on a sizable x86 system, with emphasis on cache efficiency. The algorithms and performance results are presented, along with important cache-related insights. Keywords-component; Cache Memories; Parallelism and concurrency; Parallel processors; Sorting and searching

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

IRWIN AND JOAN JACOBS CENTER FOR COMMUNICATION AND INFORMATION TECHNOLOGIES Merge Path – Cache-Efficient Parallel Merge and Sort

متن کامل

Efficient Oblivious Parallel Sorting on the MasPar MP-1

We address the problem of sorting a large number N of keys on a MasPar MP-1 parallel SIMD machine of moderate size P where the processing elements (PEs) are interconnected as a toroidal mesh and have 16KB local storage each. We present a comparative study of implementations of the following deterministic oblivious sorting methods: Bitonic Sort, Odd-Even Merge Sort, and FastSort. We successfully...

متن کامل

Proficient Pair of Replacement Algorithms on L1 and L2 Cache for Merge Sort

Memory hierarchy is used to compete the processors speed. Cache memory is the fast memory which is used to conduit the speed difference of memory and processor. The access patterns of Level 1 cache (L1) and Level 2 cache (L2) are different, when CPU not gets the desired data in L1 then it accesses L2. Thus the replacement algorithm which works efficiently on L1 may not be as efficient on L2. Si...

متن کامل

Geometric Algorithms for Private-Cache Chip Multiprocessors

We study techniques for obtaining efficient algorithms for geometric problems on private-cache chip multiprocessors. We show how to obtain optimal algorithms for interval stabbing counting, 1-D range counting, weighted 2-D dominance counting, and for computing 3-D maxima, 2-D lower envelopes, and 2-D convex hulls. These results are obtained by analyzing adaptations of either the PEM merge sort ...

متن کامل

Sorting on a Massively Parallel System Using a Library of Basic Primitives: Modeling and Experimental Results

We present a comparative study of implementations of the following sorting algorithms on the Parsytec SC320 reconfigurable, asynchronous, massively parallel MIMD machine: Bitonic Sort, Odd-Even Merge Sort, Odd-Even Merge Sort with guarded split&merge, and two variants of Samplesort. The experiments are performed on 2up to 5-dimensional wrapped butterfly networks with 8 up to 160 processors. We ...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2012

Merge Path – Cache-Efficient Parallel Merge and Sort

نویسندگان

چکیده

منابع مشابه

IRWIN AND JOAN JACOBS CENTER FOR COMMUNICATION AND INFORMATION TECHNOLOGIES Merge Path – Cache-Efficient Parallel Merge and Sort

Efficient Oblivious Parallel Sorting on the MasPar MP-1

Proficient Pair of Replacement Algorithms on L1 and L2 Cache for Merge Sort

Geometric Algorithms for Private-Cache Chip Multiprocessors

Sorting on a Massively Parallel System Using a Library of Basic Primitives: Modeling and Experimental Results

عنوان ژورنال:

اشتراک گذاری